NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Joint Language and Speaker Classification in Naturalistic Bilingual Adult-Toddler Interactions

https://doi.org/10.21437/odyssey.2024-12

Dutta, Satwik; López-Espejo, Iván; Irvin, Dwight; Hansen, John H.L. (June 2024, ISCA)

Bilingual children at a young age can benefit from exposure to two languages, which affects their language and literacy development. Speech technology can aid in developing tools that accurately quantify children’s exposure to multiple languages, thereby helping parents, teachers, and early-childhood practitioners better support bilingual children. This study lays the foundation toward this goal using the Hoff corpus, which contains naturalistic adult-child bilingual interactions collected at child ages 2½, 3, and 3½ years. Exploiting self-supervised learning (SSL) features from XLSR-53 and HuBERT, we jointly predict the language (English/Spanish) and speaker (adult/child) of each utterance using a multi-task learning approach. Our experiments indicate that a trainable linear combination of embeddings across all Transformer layers of the SSL models is a stronger indicator for both tasks, with greater benefit for speaker classification. However, language classification for children remains challenging.
Full Text Available
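The abstract above describes a trainable linear combination of embeddings across all Transformer layers, feeding a multi-task (language + speaker) classifier. A minimal forward-pass sketch follows, assuming one pooled embedding per layer, softmax-normalized per-layer weights, and a joint loss that is a weighted sum of two cross-entropies. The function names and the equal task weighting (`alpha = 0.5`) are hypothetical illustrations, not the authors' code or settings.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scalars."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(layer_embeddings: Sequence[Sequence[float]],
                       layer_logits: Sequence[float]) -> List[float]:
    """Combine one pooled embedding per Transformer layer into a single vector.

    layer_logits are the trainable scalars (one per layer); softmax turns them
    into non-negative weights that sum to 1.
    """
    if len(layer_embeddings) != len(layer_logits):
        raise ValueError("need exactly one logit per layer")
    weights = softmax(layer_logits)
    dim = len(layer_embeddings[0])
    return [sum(w * emb[d] for w, emb in zip(weights, layer_embeddings))
            for d in range(dim)]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-probability of the target class."""
    return -math.log(softmax(logits)[target])


def joint_loss(lang_logits, lang_target, spk_logits, spk_target, alpha=0.5):
    """Multi-task loss: weighted sum of language and speaker cross-entropies.

    alpha = 0.5 (equal weighting) is an illustrative assumption.
    """
    return (alpha * cross_entropy(lang_logits, lang_target)
            + (1 - alpha) * cross_entropy(spk_logits, spk_target))
```

With all layer logits at zero the combination reduces to a plain average of the layers; training would shift weight toward the more informative layers. In a real system the logits and both classifier heads would be parameters in a deep learning framework updated by backpropagation; this sketch shows only the forward computation.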
Child-adult speech diarization in naturalistic conditions of preschool classrooms using room-independent ResNet model and automatic speech recognition-based re-segmentation

https://doi.org/10.1121/10.0024353

Kothalkar, Prasanna V; Hansen, John H.L.; Irvin, Dwight; Buzhardt, Jay (February 2024, The Journal of the Acoustical Society of America)

Speech and language development are early indicators of overall analytical and learning ability in children. The preschool classroom is a rich language environment for monitoring and ensuring growth in young children by measuring their vocal interactions with teachers and classmates. Early childhood researchers are naturally interested in analyzing naturalistic vs. controlled lab recordings to measure both the quality and quantity of such interactions. Unfortunately, present-day speech technologies cannot handle the wide, dynamic range of scenarios found in early childhood classroom settings. Because of the diversity of acoustic events and conditions in such daylong audio streams, automated speaker diarization technology must be advanced to address this challenging domain, both for segmenting audio and for extracting information. This study investigates alternative deep learning-based, lightweight, knowledge-distilled diarization solutions for segmenting classroom interactions between 3–5-year-old children and their teachers. In this context, the focus is on speech-type diarization, which classifies speech segments as coming from either adults or children, across multiple classrooms. Our lightest CNN model achieves a best F1-score of ∼76.0% on data from two classrooms, based on the dev and test sets of each classroom. It is combined with automatic speech recognition-based re-segmentation modules to perform child-adult diarization. Additionally, F1-scores are obtained for individual segments with corresponding speaker tags (e.g., adult vs. child), which give educators insight into child engagement through naturalistic communication. The study demonstrates the prospects of addressing educational assessment needs through communication audio stream analysis while maintaining the security and privacy of all children and adults. The resulting child communication metrics have been used, with the help of visualizations, to provide broad-based feedback to teachers.
Full Text Available
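The abstract above reports F1-scores for labeling speech segments as adult or child. As an illustrative sketch only, the per-class F1 (harmonic mean of precision and recall) and an unweighted macro average over the two classes can be computed as below. Whether the study reports per-class, macro, or another averaging of F1 is not stated here, so the `macro_f1` choice is an assumption, and both function names are hypothetical.

```python
from typing import Sequence


def f1_for_class(reference: Sequence[str], hypothesis: Sequence[str], positive: str) -> float:
    """F1-score treating `positive` as the positive class."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must be the same length")
    pairs = list(zip(reference, hypothesis))
    tp = sum(1 for r, h in pairs if r == positive and h == positive)
    fp = sum(1 for r, h in pairs if r != positive and h == positive)
    fn = sum(1 for r, h in pairs if r == positive and h != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(reference, hypothesis, classes=("adult", "child")) -> float:
    """Unweighted mean of the per-class F1-scores (an assumed averaging scheme)."""
    return sum(f1_for_class(reference, hypothesis, c) for c in classes) / len(classes)
```

For reference labels `["adult", "adult", "child", "child"]` and predictions `["adult", "child", "child", "child"]`, the child class has precision 2/3 and recall 1 (F1 = 0.8), while the adult class has precision 1 and recall 0.5 (F1 ≈ 0.667).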
Can Smartphones be a cost-effective alternative to LENA for Early Childhood Language Intervention?

https://doi.org/10.21437/S4SG.2022-3

Dutta, Satwik; Reyna, Jacob C.; Buzhardt, Jay F.; Irvin, Dwight; Hansen, John H.L. (September 2022, Workshop on Speech for Social Good (S4SG))

Although non-profit commercial products such as LENA can provide valuable feedback to parents and early childhood educators about their children’s or students’ daily communication interactions, their cost and technology requirements put them out of reach of many families who could benefit. Over the last two decades, smartphones have become commonly used in most households irrespective of socio-economic background. In this study, conducted during the COVID-19 pandemic, we compare audio collected on LENA recorders versus smartphones available to families in an unsupervised data collection protocol. The approximately 10 hours of audio evaluated in this study were collected by three families in their homes during parent-child science book reading activities. We report comparisons and find similar performance between the two audio capture devices based on their speech signal-to-noise ratio (NIST STNR) and word error rates calculated using automatic speech recognition (ASR) engines. Finally, we discuss implications of this study for expanding this technology to more diverse populations, as well as limitations and future directions.
Full Text Available
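The abstract above compares devices using word error rates from ASR engines. Word error rate is conventionally the minimum number of word substitutions, deletions, and insertions needed to turn the reference transcript into the hypothesis, divided by the number of reference words. A minimal sketch using a Levenshtein dynamic program follows; it assumes whitespace tokenization and no text normalization, which real scoring tools typically apply, and the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1        # drop ref[i-1]
            insertion = curr[j - 1] + 1   # add hyp[j-1]
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one deletion over six reference words, giving 1/6.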
Assessing child communication engagement and statistical speech patterns for American English via speech recognition in naturalistic active learning spaces

https://doi.org/10.1016/j.specom.2022.01.006

Lileikyte, Rasa; Irvin, Dwight; Hansen, John H.L. (May 2022, Speech Communication)

Full Text Available
Children's social preference for teachers versus peers in autism inclusion classrooms: An objective perspective

https://doi.org/10.1002/aur.3276

Drye, Madison; Banarjee, Chitra; Perry, Lynn; Viggiano, Alyssa; Irvin, Dwight; Messinger, Daniel (December 2024, Autism Research)

In inclusive preschools, children with autism spectrum disorder (ASD) and other developmental disabilities (DD) are less socially engaged with peers than are typically developing (TD) children. However, there is limited objective information describing how children with ASD engage with teachers, or how teacher engagement compares to engagement with peers. We tracked over 750 hours' worth of children's (N = 77: 24 ASD, 23 DD, 30 TD; mean age = 43.98 months) and teachers' (N = 12) locations and orientations across eight inclusion preschool classrooms to quantify child-teacher and child-peer social preference. Social approach velocity and time in social contact were computed for each child and compared across social partners to index children's preference for teachers over peers. Children with ASD approached teachers, but not peers, more quickly than children with TD, and children with ASD were approached more quickly by teachers and more slowly by peers than children with TD. Children with ASD spent less time in social contact with peers and did not differ from children with TD in their time in social contact with teachers. Overall, children with ASD showed a greater preference for approaching, being approached by, and being in social contact with teachers (relative to peers) than children with TD. No significant differences emerged between children with DD and children with TD. In conclusion, children with ASD exhibited a stronger preference for engaging with teachers over peers, re-emphasizing the need for classroom-based interventions that support the peer interactions of children with ASD.
Challenges remain in Building ASR for Spontaneous Preschool Children Speech in Naturalistic Educational Environments

https://doi.org/10.21437/Interspeech.2022-555

Dutta, Satwik; Tao, Sarah Anne; Reyna, Jacob C.; Hacker, Rebecca Elizabeth; Irvin, Dwight W.; Buzhardt, Jay F.; Hansen, John H.L. (September 2022, ISCA INTERSPEECH-2022)

Monitoring child development in terms of speech/language skills has a long-term impact on children's overall growth. As student diversity continues to expand in US classrooms, there is a growing need to benchmark social-communication engagement, both from a teacher-student perspective and in student-student contexts. Given various challenges with direct observation, deploying speech technology will assist in extracting meaningful information for teachers. This will help teachers identify and respond to students in need, immediately impacting their early learning and interest. This study takes a deep dive into hybrid ASR solutions for low-resource, spontaneous speech from preschool children (3–5 years old, with and without developmental delays) who are involved in various activities and interacting with teachers and peers in naturalistic classrooms. Various out-of-domain corpora, covering both wide and limited age ranges and both scripted and spontaneous speech, were considered. Acoustic models based on factorized TDNNs infused with attention, as well as both N-gram and RNN language models, were considered. Results indicate that young children have significantly different, still-developing articulation skills compared to older children. However, out-of-domain transcripts of interactions between young children and adults enhance language model performance. Overall, transcribing such data, including various non-linguistic markers, poses additional challenges.
Full Text Available
Assessing Child Communication Engagement via Speech Recognition in Naturalistic Active Learning Spaces

https://doi.org/10.21437/Odyssey.2020-56

Lileikyte, Rasa; Irvin, Dwight; Hansen, John H. (November 2020, ISCA ODYSSEY-2020)

The ability to assess children’s conversational interaction is critical in determining language and cognitive proficiency for typically developing and at-risk children. The earlier an at-risk child is identified, the earlier support can be provided to reduce the social impact of a speech disorder. To date, limited research has been performed on young child speech recognition in classroom settings. This study addresses speech recognition for naturalistic children’s speech, with ages ranging from 2.5 to 5 years. Data augmentation is relatively underexplored for child speech; therefore, we investigate the effectiveness of data augmentation techniques for improving both language and acoustic models. We explore alternative text augmentation approaches using adult data, Web data, and text generated by recurrent neural networks. We also compare several acoustic augmentation techniques: speed perturbation, tempo perturbation, and adult data. Finally, we comment on child word count rates to assess child speech development.
Full Text Available
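Speed perturbation, one of the acoustic augmentation techniques named in the abstract above, resamples a waveform so it plays faster or slower, changing both duration and pitch (tempo perturbation, by contrast, changes duration while preserving pitch and is not shown here). The sketch below uses simple linear interpolation as an assumption for illustration; practical pipelines generally rely on a proper band-limited resampler such as SoX, and factors such as 0.9, 1.0, and 1.1 are a common choice rather than values taken from this study. The function name is hypothetical.

```python
from typing import List, Sequence


def speed_perturb(samples: Sequence[float], factor: float) -> List[float]:
    """Return `samples` played `factor` times faster (duration and pitch both change).

    factor > 1 shortens the signal, factor < 1 lengthens it.
    Linear interpolation is used for simplicity; it is not band-limited.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(len(samples) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor                            # fractional read position in the input
        left = int(pos)
        right = min(left + 1, len(samples) - 1)     # clamp at the last sample
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out
```

Applying several factors to each training utterance yields multiple copies with different speaking rates, which is how this kind of augmentation enlarges a small child-speech training set.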
Tagging child-adult interactions in naturalistic, noisy, daylong school environments using i-vector based diarization system

https://doi.org/10.21437/SLaTE.2019-17

Kothalkar, Prasanna V.; Irvin, Dwight; Luo, Ying; Rojas, Joanne; Nash, John; Rous, Beth; Hansen, John H. (August 2020, ISCA SLaTE-2019 Workshop)

Child growth in speech and language is a crucial indicator of long-term learning ability and life-long progress. Since the preschool classroom provides a potent opportunity for monitoring growth in young children’s interactions, analyzing such data has come into prominence for early childhood researchers. The foremost task in any analysis of such naturalistic recordings is parsing and tagging the interactions between adults and young children. An automated tagging system will provide child interaction metrics and would be important for any further processing. This study investigates the language environment of 3–5-year-old children using a CRSS-based diarization strategy with an i-vector-based baseline that captures rapid adult-to-child and child-to-child conversational turns in a naturalistic, noisy early childhood setting. We provide an analysis of various loss functions and learning algorithms using Deep Neural Networks to separate child speech from adult speech. Performance, measured in terms of diarization error rate and Jaccard error rate, shows good results for tagging adult vs. child speech. Distinguishing between the primary and secondary child would be useful for monitoring a given child, and an analysis of this distinction is also provided. Our diarization system provides insights into the direction for preprocessing and analyzing challenging naturalistic daylong child speech recordings.
Full Text Available
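The abstract above measures performance with diarization error rate (DER), which sums missed speech, false-alarm speech, and speaker confusion, then divides by the total reference speech time. The sketch below is a simplified frame-level version for adult/child speech-type labels, under stated assumptions: one label per frame (no overlapping speech), labels already drawn from the same classes so no speaker mapping is needed, and no forgiveness collar. Standard scoring tools handle segments, overlap, optimal mapping, and collars, so this is an illustration rather than the evaluation used in the study. The function name is hypothetical.

```python
from typing import Optional, Sequence


def frame_der(reference: Sequence[Optional[str]],
              hypothesis: Sequence[Optional[str]]) -> float:
    """Frame-level DER for speech-type labels ("adult", "child", or None for non-speech)."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")
    missed = false_alarm = confusion = ref_speech = 0
    for r, h in zip(reference, hypothesis):
        if r is not None:
            ref_speech += 1
            if h is None:
                missed += 1          # reference speech labeled as non-speech
            elif h != r:
                confusion += 1       # speech attributed to the wrong class
        elif h is not None:
            false_alarm += 1         # non-speech labeled as speech
    if ref_speech == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / ref_speech
```

For reference `["adult", "adult", "child", None, "child"]` and hypothesis `["adult", "child", "child", "adult", None]`, there is one confusion, one false alarm, and one miss over four reference speech frames, so DER = 0.75.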
Speech and language processing for assessing child–adult interaction based on diarization and location

https://doi.org/10.1007/s10772-019-09590-0

Hansen, John H.; Najafian, Maryam; Lileikyte, Rasa; Irvin, Dwight; Rous, Beth (September 2019, International Journal of Speech Technology)

Full Text Available
